Tip: This page is generated from a Jupyter notebook. Some of the code is hidden; some of it can be shown by clicking the Show Code button. To see the complete notebook, click the View on GitHub button above.

Introduction


Undoubtedly, the recent appearance and spread of the COVID-19 virus has affected the lives of billions of people worldwide in many respects. Governments have been under constant pressure to reduce social interaction in order to limit the chances of virus transmission. They have therefore introduced strict measures to deal with this severe situation, which have had a significant impact on everybody's life.

The economy was the first area to be affected by those measures. Work culture had to change to comply with government directives, which pushed companies to move faster towards digitalisation. As a result, companies that were reluctant to make such changes faced serious financial problems, forcing them in many cases to reduce their workforce. For other companies, such as travel agencies or companies in the hospitality sector, the hit was even harder, since their profits depend entirely on people's need for entertainment, social exploration and so on. Many of them have completely or partially shut down their operations, leaving many people unemployed.

The above are common observations and may seem discouraging and demotivating to many people. However, we cannot conclude how large this impact is on each country's overall economy without an in-depth investigation of actual data.

For that reason, we decided to analyse data from both a microeconomic and a macroeconomic point of view in order to get a clearer understanding of how the virus has affected the economy.

To sum up, this study aims to provide a clear conclusion about the economic consequences of COVID-19, based on an analysis of reliable sources. Through interactive and annotated graphs, we want to give the intended audience all the information needed to understand the impact of COVID-19 on the economy in a simple and concise manner.

Data Analysis

In this study we analyse data from all the countries directly affected by COVID-19, with a particular focus on Denmark. We start by presenting a statistical analysis of the COVID-19 situation in the major countries. We then bring in financial data to explore whether the virus has had a significant impact on the economy and which countries have been affected the most. The analysis uses data from the IMF, the OECD and other sources, which are listed at the end of the page. We chose those datasets because we believe they contain the information needed to assess the financial impact of COVID-19.

COVID-19 analysis

In this section, we take a closer look at the COVID-19 data to present the current state of the virus by showing the numbers of confirmed cases and deaths across major countries. Then, with the help of interactive representations of those numbers, we try to understand the spread rate and distribution of COVID-19.

The following table shows a sample of the COVID-19 data. The dataset contains the date, the country, the confirmed and recovered cases, and the overall deaths per country.

       Date        Country             Confirmed  Recovered  Deaths
20004  2020-05-07  West Bank and Gaza        375        176       2
20005  2020-05-07  Western Sahara              6          5       0
20006  2020-05-07  Yemen                      25          1       5
20007  2020-05-07  Zambia                    153        103       4
20008  2020-05-07  Zimbabwe                   34          5       4

Exploration analysis

In this section we perform a basic statistical analysis of the data in order to see how the values are distributed across the columns and to detect any important patterns that might be useful later in the analysis.

First, we present the descriptive statistics of our dataset. In this way we can summarize the central tendency, dispersion and shape of our dataset's distribution.

The table below shows large differences in the maximum values among the three case types. The standard deviation is high in all three columns, which means that the data is widely spread out. The 25th and 50th percentiles for recovered cases and deaths are zero, while the 75th percentiles are 31 and 6, respectively.

Confirmed Recovered Deaths
count 2.000900e+04 20009.000000 20009.000000
mean 4.936414e+03 1377.999350 325.218252
std 3.911299e+04 9241.404398 2705.461879
min 0.000000e+00 0.000000 0.000000
25% 0.000000e+00 0.000000 0.000000
50% 7.000000e+00 0.000000 0.000000
75% 3.280000e+02 31.000000 6.000000
max 1.257023e+06 195036.000000 75662.000000
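
A summary table of this shape is what pandas `DataFrame.describe()` produces. As a minimal sketch, using a small made-up frame rather than the real dataset (the values below are illustrative only), the quartiles and maximum can be read off like this:

```python
import pandas as pd

# Toy data, invented for illustration; not taken from the COVID-19 dataset.
toy = pd.DataFrame({
    "Confirmed": [0, 0, 7, 328, 1000],
    "Deaths": [0, 0, 0, 6, 50],
})

# describe() returns one row per statistic:
# count, mean, std, min, 25%, 50%, 75%, max.
summary = toy.describe()

# Pick out the percentile and maximum rows discussed in the text.
print(summary.loc[["25%", "50%", "75%", "max"]])
```

When most rows are zero, as here, the lower percentiles stay at zero while the mean and standard deviation are pulled up by a few large values.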

The three figures below illustrate how the cases are distributed across countries (for the sake of simplicity and space, only countries with at least 1,000 deaths are shown).

The figures show the maximum values of the cases for each country, in order to identify which countries have recorded the highest numbers of confirmed cases, recoveries and deaths due to COVID-19. Narrowing down to the top five, France, Spain, the USA, Italy and the United Kingdom have had the highest numbers of confirmed cases as well as deaths, while the top five for recorded recoveries are the USA, Italy, Spain, Germany and China. Looking at the deaths, it is remarkable how many more deaths the top five countries have recorded than the rest.

# collapse-show
import altair as alt
import pandas as pd

# maximum (latest cumulative) values per country, sorted for plotting
group = full_clean_data.groupby('Country')[['Deaths','Confirmed','Recovered']].max().sort_values(by=['Deaths','Confirmed','Recovered'])
group = group.reset_index()
# keep only the countries with at least 1000 deaths
new_group = group.query("Deaths >= 1000")


#define colors
red = alt.value('#f54242')
green = alt.value('#137E2A')
black = alt.value('#050404')

#presenting the confirmed cases per country
bars = alt.Chart(new_group).mark_bar(size=5).encode(
    x='Confirmed:Q',
    y=alt.Y("Country:O", sort='-x'),color = red
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='Confirmed:Q',color =black
)

bars2 = alt.Chart(new_group).mark_bar(size=5).encode(
    x='Recovered:Q',
    y=alt.Y("Country:O", sort='-x'),color=green
)


text2 = bars2.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='Recovered:Q',color=black
)

bars3 = alt.Chart(new_group).mark_bar(size=5).encode(
    x='Deaths:Q',
    y=alt.Y("Country:O", sort='-x'),color=black
)


text3 = bars3.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='Deaths:Q',color=black
)



laydermap = (bars + text).properties(width= 250,height=300)|(bars2+text2).properties(width= 250,height=300)|(bars3+text3).properties(width=250,height=300)
laydermap.configure_axis(grid=False).configure_view(strokeWidth=0)

Looking at the basic distribution of the data across all countries, we decided to focus on a few major countries in order to make our analysis more robust. In the rest of the study we therefore put more focus on the following countries:

  • China: because it is where COVID-19 first appeared.
  • Denmark: because it is the country where this study was carried out.
  • USA, UK, Italy, Spain, France: because these are the countries most affected by the pandemic.

Further data exploration and preparation


To extract as much information as possible, it is necessary to combine several datasets. By doing so, we add columns for daily new cases, new deaths and new recovered cases. In addition, missing values have to be investigated and treated to bring the dataset into a form ready for analysis. In the present study the missing values were filled with zeros. We considered this the best way to treat such values because filling them with, for example, the mean, mode or median could lead to a false interpretation of the results.
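
To make that reasoning concrete, here is a small sketch on made-up numbers (not the real data) comparing the two options. Filling with the column mean invents a non-zero daily count for the day that has no value, whereas filling with zero leaves that day empty.

```python
import pandas as pd

# Made-up daily counts; the first day has no value because there is
# no previous day to compare against.
new_cases = pd.Series([None, 10.0, 30.0, 20.0], name="New cases")

# Option used in this study: treat the missing day as zero new cases.
filled_zero = new_cases.fillna(0)

# Alternative: fill with the mean of the observed days.
filled_mean = new_cases.fillna(new_cases.mean())

print(filled_zero.tolist())
print(filled_mean.tolist())
print(filled_zero.sum(), filled_mean.sum())
```

With the mean, the total over the period grows by cases that were never reported, which is the kind of distortion the zero fill avoids.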

The following tables show, first, a sample of the final COVID-19 dataset after preprocessing and, second, the descriptive statistics of the dataset.

# collapse-show
from datetime import datetime, timedelta

# data processing to create Active, New cases, New deaths, New recovered
full_clean_data['Active'] = full_clean_data['Confirmed'] - full_clean_data['Recovered'] - full_clean_data['Deaths']

countries = ['US', 'Italy', 'China', 'Spain', 'France', 'United Kingdom', 'Denmark']
# copy so that the assignments below modify an independent frame, not a view of full_clean_data
selected_data = full_clean_data[full_clean_data['Country'].isin(countries)].copy()

for i in selected_data.index:
    date = selected_data.loc[i, 'Date']
    country = selected_data.loc[i, 'Country']
    date = datetime.strptime(date, '%Y-%m-%d')
    yesterday = datetime.strftime(date - timedelta(1), '%Y-%m-%d')
    yesterdayData = selected_data.loc[(selected_data.Date == yesterday) & (selected_data.Country == country)]
    if len(yesterdayData) <= 0:
        selected_data.loc[i, 'New cases'] = 0
        selected_data.loc[i, 'New deaths'] = 0
        selected_data.loc[i, 'New recovered'] = 0
        continue
    yesterdayData = yesterdayData.iloc[0]
    selected_data.loc[i, 'New cases'] = selected_data.loc[i, 'Confirmed'] - yesterdayData.Confirmed
    selected_data.loc[i, 'New deaths'] = selected_data.loc[i, 'Deaths'] - yesterdayData.Deaths
    selected_data.loc[i, 'New recovered'] = selected_data.loc[i, 'Recovered'] - yesterdayData.Recovered

selected_data = selected_data.fillna(value=0)
selected_data['New cases'] = selected_data['New cases'].astype(int)
selected_data['New deaths'] = selected_data['New deaths'].astype(int)
selected_data['New recovered'] = selected_data['New recovered'].astype(int)

       Date        Country         Confirmed  Recovered  Deaths  Active  New cases  New deaths  New recovered
19884  2020-05-07  France             174918      55191   25990   93737        694         178           1112
19907  2020-05-07  Italy              215858      96276   29958   89624       1401         274           3031
19979  2020-05-07  Spain              221447     128511   26070   66866       1122         213           2509
19995  2020-05-07  US                1257023     195036   75662  986325      28420        2231           5126
19999  2020-05-07  United Kingdom     207977        970   30689  176318       5618         539             36

Confirmed Recovered Deaths Active New cases New deaths New recovered
count 7.430000e+02 743.000000 743.000000 743.000000 743.000000 743.000000 743.000000
mean 8.177973e+04 19688.374159 6534.983849 55556.375505 2935.218035 255.889637 751.129206
std 1.797253e+05 34388.425325 12174.092433 146289.892228 6662.228370 480.369415 1914.035429
min 0.000000e+00 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 1.150000e+01 2.000000 0.000000 8.000000 0.000000 0.000000 0.000000
50% 7.783000e+03 276.000000 309.000000 2357.000000 150.000000 8.000000 13.000000
75% 8.391050e+04 27572.500000 6029.500000 54896.500000 3066.500000 347.000000 825.000000
max 1.257023e+06 195036.000000 75662.000000 986325.000000 36188.000000 2612.000000 33227.000000
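
The row-by-row loop above looks up the previous day's record for each row. Assuming the data has one row per country per day with no gaps in the dates (an assumption, not something the notebook checks), the same daily differences can be sketched with `groupby(...).diff()`. The frame and its values below are made up for illustration.

```python
import pandas as pd

# Toy cumulative counts for two countries over three consecutive days.
toy = pd.DataFrame({
    "Date": ["2020-03-01", "2020-03-02", "2020-03-03"] * 2,
    "Country": ["Denmark"] * 3 + ["Italy"] * 3,
    "Confirmed": [3, 5, 9, 100, 150, 230],
    "Deaths": [0, 0, 1, 10, 12, 20],
})

# Sort so that, within each country, rows are in date order.
toy = toy.sort_values(["Country", "Date"]).reset_index(drop=True)

# diff() subtracts the previous row within each country; the first day of
# each country has no previous row, so it is filled with zero.
daily = toy.groupby("Country")[["Confirmed", "Deaths"]].diff().fillna(0).astype(int)
toy["New cases"] = daily["Confirmed"]
toy["New deaths"] = daily["Deaths"]

print(toy)
```

If dates can be missing for a country, this sketch would compare against the last available day rather than yesterday, so it is not a drop-in replacement in that case.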

# collapse-show

# maximum daily values per country, sorted for plotting
group2 = selected_data.groupby('Country')[['New deaths','New cases','New recovered']].max().sort_values(by=['New deaths','New cases','New recovered'])
group2 = group2.reset_index()
# all selected countries are kept (no minimum-deaths filter is applied here)
new_group2 = group2


#define colors
red = alt.value('#f54242')
green = alt.value('#137E2A')
black = alt.value('#050404')

#presenting the daily new cases per country
bars = alt.Chart(new_group2).mark_bar(size=5).encode(
    x='New cases:Q',
    y=alt.Y("Country:O", sort='-x'),color = red
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='New cases:Q',color =black
)

bars2 = alt.Chart(new_group2).mark_bar(size=5).encode(
    x='New recovered:Q',
    y=alt.Y("Country:O", sort='-x'),color=green
)


text2 = bars2.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='New recovered:Q',color=black
)

bars3 = alt.Chart(new_group2).mark_bar(size=5).encode(
    x='New deaths:Q',
    y=alt.Y("Country:O", sort='-x'),color=black
)


text3 = bars3.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='New deaths:Q',color=black
)



laydermap = (bars + text).properties(width= 250,height=300)|(bars2+text2).properties(width= 250,height=300)|(bars3+text3).properties(width=250,height=300)
laydermap.configure_axis(grid=False).configure_view(strokeWidth=0)

Overview of COVID-19 current distribution worldwide

Now we would like to illustrate how COVID-19 has spread among the analysed countries. The first plot shows the relation between confirmed cases and deaths from the day of the first diagnosed case up to now. By moving the slider under the plot, the day-by-day increase in deaths can be observed. It is striking how many more deaths than in other countries were recorded in the USA in only 60 days (at the time the report was written).

# collapse-hide
# data processing
start_date = datetime.strptime('2020-01-22', '%Y-%m-%d')
for index, row in selected_data.iterrows():
    date = datetime.strptime(row['Date'], '%Y-%m-%d')
    selected_data.loc[index, 'Day'] = (date - start_date).days
    
selected_data['Day'] = selected_data['Day'].astype(int)
# plot
select_date = alt.selection_single(
    name='select', fields=['Day'], init={'Day': 0},
    bind=alt.binding_range(min=0, max=selected_data.Day.max(), step=1)
)
alt.Chart(selected_data, title='COVID-19 Spread Over Time').transform_filter(
    alt.datum.Country != 'Iran').mark_point(filled=True).encode(
    alt.X('Confirmed', scale=alt.Scale(zero=False)),
    alt.Y('Deaths', scale=alt.Scale(zero=False)),
    alt.Size('Active'),
    alt.Color('Country'),
    alt.Order('Confirmed', sort='descending'),
    tooltip = [alt.Tooltip('Confirmed'),
               alt.Tooltip('Deaths'),
               alt.Tooltip('Active')
              ],
).properties(
    width=750,
    height=400
).add_selection(select_date).transform_filter(select_date)
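
The day index used by the slider is the number of days since 22 January 2020. As a sketch, the same column can be computed without a Python loop by converting the dates with `pd.to_datetime`; the toy dates below are illustrative.

```python
import pandas as pd

# Toy dates in the same 'YYYY-MM-DD' string format as the dataset.
toy = pd.DataFrame({"Date": ["2020-01-22", "2020-01-23", "2020-02-01", "2020-05-07"]})

start_date = pd.Timestamp("2020-01-22")

# Subtracting a Timestamp from a datetime column gives timedeltas;
# .dt.days extracts the whole number of days as integers.
toy["Day"] = (pd.to_datetime(toy["Date"]) - start_date).dt.days

print(toy)
```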

The plot below illustrates how COVID-19 has spread among the major countries and how they compare to Denmark. China, where COVID-19 first appeared, shows a sharp increase in the number of cases per day during February and manages to bring those numbers down in a relatively short period of time thanks to strict measures. In the rest of the countries (apart from Denmark), which did not apply strict measures in time, we observe a sharp increase in new cases and no significant drop since those numbers reached their peak. In the case of the USA and the UK, the numbers seem to keep increasing.

# collapse-hide
# plot
interval = alt.selection_interval()

circle = alt.Chart(selected_data, title='Spread and New Cases Over Time').transform_filter(
    alt.datum.Country != 'Iran').mark_circle().encode(
    x='monthdate(Date):O',
    y='Country',
    color=alt.condition(interval, 'Country', alt.value('lightgray')),
    size=alt.Size('New cases:Q',
        scale=alt.Scale(range=[0, 3000]),
        legend=alt.Legend(title='Daily new cases')
    ) 
).properties(
    width=1000,
    height=400,
    selection=interval
)

bars = alt.Chart(selected_data).mark_bar().encode(
    y='Country',
    color='Country',
    x='sum(New cases):Q'
).properties(
    width=1000
).transform_filter(
    interval
)

circle & bars

Death and infection rates


The figure below illustrates the total confirmed cases and deaths in Denmark from the day the virus appeared in the country (approximately January 22) until April 23.

It seems that even during the lockdown (10/03/2020 - 20/03/2020) the numbers of confirmed cases and deaths showed an increasing trend. It has to be highlighted, however, that the purpose of the lockdown was to keep these numbers as low as possible so as not to exceed the number of cases the health system can handle.

Moving on, the figures show the number of new cases (3rd pane from the left) and deaths (4th pane from the left) in Denmark during the same period.

From when the lockdown was implemented (around the 10th of March) until the 13th of the same month, 170 new cases were recorded daily, whereas on the 14th of March a significant drop of approximately 75% in recorded cases is observed. Between the 24th of March and the 9th of April the number of confirmed cases reached its peak, with an average of 300 cases per day. The recorded cases then began to drop again and have now reached an average of 150 cases per day (10/05/2020 - 23/05/2020).

The number of daily deaths reached its highest level between the 3rd and 9th of April and dropped by approximately 50% after that. As of today the number of deaths per day does not exceed 9. Overall, we can see that the measures against the virus led to a reduction in deaths and confirmed cases after their implementation.

After investigating new cases and deaths, we would like to check how the death rate has developed in each country, which is what the figure below shows. Hovering over each line gives the exact death rate for the major countries. As we can see, there is an increasing trend in the death rate as the virus spreads. China is the only country that seems to hold steady, from the 12th of March to the 12th of April. Surprisingly, the USA has a relatively low death rate considering the high number of cases recorded over the last couple of months. Another interesting observation is the curve for France in the first half of March and how it goes up again within a period of only 1.5 months.

#collapse-hide
#data preprocessing
#death rate
selected_data['DeathRate'] = (selected_data['Deaths']+selected_data['New deaths'])/(selected_data['Confirmed']+selected_data['New cases']) * 100
selected_data = selected_data.fillna(value=0)

#recovery rate (in percent, to match the 'Recovery Rate %' axis title and the death rate above)
selected_data['RecoveryRate'] = ((selected_data['Recovered']+selected_data['New recovered'])/(selected_data['Confirmed']+selected_data['New cases'])) * 100
selected_data = selected_data.fillna(value=0)


#infection rate
population = {'Denmark':5792202,
             'China':1408526202,
             'France':65273511,
             'Italy':60461826,
             'Spain':46754775,
             'US':331002651,
             'United Kingdom':67886011}


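# look up each row's own population so that every country is divided by its own figure
# (the previous nested loop overwrote the whole column on every match, so all rows
# ended up divided by the population of the last country in the frame)
selected_data['InfectionRate'] = (selected_data['Confirmed']+selected_data['New cases'])/selected_data['Country'].map(population) * 100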
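
Because each country has its own population, the divisor has to vary row by row. One way to do that, sketched here on made-up case counts (the population figures are two of the ones from the dictionary above), is to look up each row's population with `Series.map`. For simplicity this sketch divides cumulative confirmed cases only, which differs slightly from the formula used in the notebook.

```python
import pandas as pd

# Two entries copied from the population dictionary in the notebook.
population = {"Denmark": 5792202, "Italy": 60461826}

# Made-up confirmed counts, chosen to be roughly 0.1% and 0.2% of the population.
toy = pd.DataFrame({
    "Country": ["Denmark", "Italy", "Denmark"],
    "Confirmed": [5792, 60462, 11584],
})

# map() replaces each country name with its population, row by row.
toy["Population"] = toy["Country"].map(population)

# Share of the population with a confirmed case, in percent.
toy["InfectionRate"] = toy["Confirmed"] / toy["Population"] * 100

print(toy)
```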

#collapse-hide

# A dropdown filter
countries = list(selected_data.Country.unique())
country_dropdown = alt.binding_select(options=countries)
country_select = alt.selection_single(fields=['Country'], bind=country_dropdown, name="Select")


#plot infection rate
filter_infectionrates = alt.Chart(selected_data, width=300, height=300, title='Infection Rate').mark_line().encode(
    alt.X('Date:T'),
    alt.Y('InfectionRate:Q', title= 'Infection Rate %'),
    color='Country',
    tooltip = [alt.Tooltip('InfectionRate:Q')]
).add_selection(country_select).transform_filter(country_select)



# plot death rate
filter_deathrate = alt.Chart(selected_data, width=300, height=300, title='Death Rate').mark_line().encode(
    alt.X('Date:T'),
    alt.Y('DeathRate:Q', title= 'Death Rate %'),
    color='Country',
    tooltip = [alt.Tooltip('DeathRate:Q')]
).add_selection(country_select).transform_filter(country_select)

# plot recovery rate
filter_recovery = alt.Chart(selected_data, width=300, height=300, title='Recovery Rate').mark_line().encode(
    alt.X('Date:T'),
    alt.Y('RecoveryRate:Q', title= 'Recovery Rate %'),
    color='Country',
    tooltip = [alt.Tooltip('RecoveryRate:Q')]
).add_selection(country_select).transform_filter(country_select)


(filter_infectionrates | filter_deathrate | filter_recovery)

Macroeconomic

In this section we attempt an economic analysis from a macroeconomic point of view and, in relation to the COVID-19 analysis above, try to draw potential conclusions on how the spread of the virus has affected the global economy. A closer look at Denmark is given in this section as well.

Macroeconomics is a branch of economics that studies how an overall economy behaves (it focuses on the large scale). More precisely, macroeconomics studies economy-wide phenomena such as inflation, price levels, rate of economic growth, national income, gross domestic product (GDP), and changes in unemployment (Investopedia).

Stock Market


# collapse-hide
#import sectors data
chemicalsdk = pd.read_csv(path+'Copenhagen Chemicals Historical Data.csv')
consumersdk = pd.read_csv(path+'Copenhagen Consumer Goods Historical Data.csv')
servicesdk = pd.read_csv(path+'Copenhagen Consumer Services Historical Data.csv')
financialsdk = pd.read_csv(path+'Copenhagen Financials Historical Data.csv')
healthdk = pd.read_csv(path+'Copenhagen Health Care Historical Data.csv')
industrialsdk = pd.read_csv(path+'Copenhagen Industrials Historical Data.csv')
ogdk = pd.read_csv(path+'Copenhagen Oil & Gas Historical Data.csv')
realdk = pd.read_csv(path+'Copenhagen Real Estate Historical Data.csv')
technologydk = pd.read_csv(path+'Copenhagen Technology Historical Data.csv')



# stock data preprocessing
stockOMX20['Symbol'] = 'OMX 20'
stockCopenhagenAllShare['Symbol'] = 'Copenhagen All Shares'
#stockOMX25['Symbol'] = 'OMX 25'
chemicalsdk['Symbol'] = 'Chemicals'
consumersdk['Symbol'] = 'Consumer Goods'
servicesdk['Symbol'] = 'Consumer Services'
financialsdk['Symbol'] = 'Financials'
healthdk['Symbol'] = 'Health Care'
industrialsdk['Symbol'] = 'Industrials'
ogdk['Symbol'] = 'Oil & Gas'
realdk['Symbol'] = 'Real Estate'
technologydk['Symbol'] = 'Technology'



stockAll = pd.concat([stockOMX20, stockCopenhagenAllShare,chemicalsdk,consumersdk,servicesdk,financialsdk,
                     healthdk,industrialsdk,ogdk,realdk,technologydk])
stockAll['Date'] = pd.to_datetime(stockAll.Date)
stockAll = stockAll.sort_values(by=['Symbol', 'Date'])
stockAll['Price'] = stockAll['Price'].str.replace(',', '')
stockAll['Price'] = stockAll['Price'].astype(float)
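
The load, label, concatenate and clean steps above repeat for every country. As a sketch of that pattern on in-memory CSV text, it could look like the following. The helper name `label_frame`, the sample prices and the 'May 07, 2020' date style are assumptions made for illustration; the real files are read from `path`.

```python
import io
import pandas as pd

# Made-up CSV contents standing in for two downloaded price files.
csv_a = 'Date,Price\n"May 07, 2020","1,234.50"\n"May 06, 2020","1,200.00"\n'
csv_b = 'Date,Price\n"May 07, 2020","987.10"\n"May 06, 2020","990.00"\n'

def label_frame(csv_text, symbol):
    """Read one price table and tag every row with the series name."""
    # Read Price as text so that files with and without thousands
    # separators end up with the same column type before cleaning.
    frame = pd.read_csv(io.StringIO(csv_text), dtype={"Price": str})
    frame["Symbol"] = symbol
    return frame

stock = pd.concat(
    [label_frame(csv_a, "Index A"), label_frame(csv_b, "Sector B")],
    ignore_index=True,
)
stock["Date"] = pd.to_datetime(stock["Date"])
# Remove thousands separators, then convert the text to numbers.
stock["Price"] = stock["Price"].str.replace(",", "", regex=False).astype(float)
stock = stock.sort_values(["Symbol", "Date"]).reset_index(drop=True)

print(stock)
```

Reading `Price` as text is an added precaution: if one file has no comma in any price, pandas would parse it as numbers, and the later `.str.replace` would turn those values into missing values after concatenation.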

#collapse-hide
line = alt.Chart(stockAll).mark_line(interpolate='basis').encode(
    x='Date',
    y='Price',
    color='Symbol:N',
)


nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['Date'], empty='none')



selectors = alt.Chart(stockAll, title='Major Index and Primary Sectors Stocks Price (Denmark) ').mark_point().encode(
    x='Date',
    opacity=alt.value(0)
).add_selection(
    nearest
)

# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Price', alt.value(' '))
)

# Draw a rule at the location of the selection
rules = alt.Chart(stockAll).mark_rule(color='gray').encode(
    x='Date',
).transform_filter(
    nearest
)

# Put the five layers into a chart and bind the data
alt.layer(
    line, selectors, points, rules, text
).properties(
    width=600, height=300
)

#collapse-hide
# stock data preprocessing for France
# Top 40 companies in France
stockCAC40 = pd.read_csv(path+'CAC40.csv')

# importing sectors
CACbasic = pd.read_csv(path+'CACBasicMaterials.csv')
CACconsumer=pd.read_csv(path+'CACConsumerGoods.csv')
CACservice =pd.read_csv(path+'CACConsumerService.csv')
CACfinancial =pd.read_csv(path+'CACFinancials.csv')
CACutilities =pd.read_csv(path+'CACUtilities.csv')
CACtech =pd.read_csv(path+'CACTechnology.csv')
CAChealth =pd.read_csv(path+'CACHealthCare.csv')
CACoil =pd.read_csv(path+'CACOil&Gas.csv')
CACindustrial =pd.read_csv(path+'CACIndustrials.csv')
cacall = pd.read_csv(path+'CAC All Shares.csv')
#prepare the data for plotting
stockCAC40['Symbol']='CAC 40'
CACbasic['Symbol'] = 'CAC Basic Materials'
CACconsumer['Symbol'] = 'CAC Consumer Goods'
CACservice['Symbol'] = 'CAC Consumer Services'
CACfinancial['Symbol'] = 'CAC Financials'
CACutilities['Symbol'] = 'CAC Utilities'
CACtech['Symbol'] = 'CAC Technology'
CAChealth['Symbol'] = 'CAC Health Care'
CACoil['Symbol'] = 'CAC Oil & Gas'
CACindustrial['Symbol'] = 'CAC Industrials'
cacall['Symbol'] = 'France All Shares'
stockFRA = pd.concat([stockCAC40,CACbasic,CACconsumer,CACservice,CACfinancial,CACutilities,CACtech,
                     CAChealth,CACoil,CACindustrial,cacall],sort = True)
stockFRA['Date'] = pd.to_datetime(stockFRA.Date)
stockFRA = stockFRA.sort_values(by=['Symbol','Date'])
stockFRA['Price'] = stockFRA['Price'].str.replace(',','')
stockFRA['Price'] = stockFRA['Price'].astype(float)

# collapse-hide
line = alt.Chart(stockFRA).mark_line(interpolate='basis').encode(
    x='Date',
    y='Price',
    color='Symbol',
)

nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['Date'], empty='none')

selectors = alt.Chart(stockFRA, title='Major Index & Primary Sectors Stocks Price(France)').mark_point().encode(
    x='Date',
    opacity=alt.value(0)
).add_selection(
    nearest
)

# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Price', alt.value(' '))
)

# Draw a rule at the location of the selection
rules = alt.Chart(stockFRA).mark_rule(color='gray').encode(
    x='Date',
).transform_filter(
    nearest
)

# Put the five layers into a chart and bind the data
alt.layer(
    line, selectors, points, rules, text
).properties(
    width=600, height=300
)

#collapse-hide

#importing stocks for Italy
stockMIB = pd.read_csv(path+'FTSE MIB.csv')
utilities = pd.read_csv(path+'FTSE Italia Utilities.csv')
#Telecommunications = pd.read_csv(path+'FTSE Italia Telecommunications.csv')
Technology = pd.read_csv(path+'FTSE Italia Technology.csv')
O_G = pd.read_csv(path+'FTSE Italia Oil & Gas.csv')
Travel = pd.read_csv(path+'FTSE Italia All Share Travel & Leisure.csv')
industrials = pd.read_csv(path+'FTSE Italia All Share Industrials.csv')
financials = pd.read_csv(path+'FTSE Italia All Share Financials.csv')
health = pd.read_csv(path+'FTSE Italia All Share Health Care.csv')
chemicals = pd.read_csv(path+'FTSE Italia All Share Chemicals.csv')
allsharesitalia = pd.read_csv(path+'FTSE Italia All Share.csv')
#prepare data for plotting
stockMIB['Symbol']='MIB'
utilities['Symbol'] = 'FTSE Utilities'
#Telecommunications['Symbol'] = 'FTSE Telecommunications'
Technology['Symbol'] = 'FTSE Technology'
O_G['Symbol'] = 'FTSE Oil & Gas'
Travel['Symbol'] = 'FTSE Travel & Leisure'
industrials['Symbol'] = 'FTSE Industrials'
financials['Symbol'] = 'FTSE Financials'
health['Symbol'] = 'FTSE Health Care'
chemicals['Symbol'] = 'FTSE Chemicals'
allsharesitalia['Symbol'] = 'Italy All Shares'
stockITA = pd.concat([stockMIB,utilities,Technology,O_G,Travel,
                     industrials,financials,health,chemicals,allsharesitalia],sort = True)
stockITA['Date'] = pd.to_datetime(stockITA.Date)
stockITA = stockITA.sort_values(by=['Symbol','Date'])
stockITA['Price'] = stockITA['Price'].str.replace(',','')
stockITA['Price'] = stockITA['Price'].astype(float)

# collapse-hide
line = alt.Chart(stockITA).mark_line(interpolate='basis').encode(
    x='Date',
    y='Price',
    color='Symbol',
)

nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['Date'], empty='none')

selectors = alt.Chart(stockITA, title='Major Index & Primary Sectors Stocks Price(Italy)').mark_point().encode(
    x='Date',
    opacity=alt.value(0)
).add_selection(
    nearest
)

# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Price', alt.value(' '))
)

# Draw a rule at the location of the selection
rules = alt.Chart(stockITA).mark_rule(color='gray').encode(
    x='Date',
).transform_filter(
    nearest
)

# Put the five layers into a chart and bind the data
alt.layer(
    line, selectors, points, rules, text
).properties(
    width=600, height=500
)

#collapse-hide

#importing stocks for Spain
ibex = pd.read_csv(path+'IBEX 35 Historical Data.csv')
materials= pd.read_csv(path+'Madrid Basic Materials Industry and Construction Historical Data.csv')
consumer = pd.read_csv(path+'Madrid Consumer Goods Historical Data.csv')
service = pd.read_csv(path+'Madrid Consumer Services Historical Data.csv')
financial = pd.read_csv(path+'Madrid Financial Services & Real Estate Historical Data.csv')
petrol = pd.read_csv(path+'Madrid Petrol and Power Historical Data.csv')
technology = pd.read_csv(path+'Madrid Technology and Telecommunications Historical Data.csv')
spainall = pd.read_csv(path+'IBEX MAB All Share Historical Data.csv')

#prepare data for plotting
ibex['Symbol']='IBEX 35'
materials['Symbol'] = 'Basic Materials Industry and Construction'
consumer['Symbol'] = 'Consumer Goods'
service['Symbol'] = 'Services'
financial['Symbol'] = 'Financial Services & Real Estate'
petrol['Symbol'] = 'Petrol and Power'
technology['Symbol'] = 'Technology and Telecommunications'
spainall['Symbol'] = 'Spain All Shares'
stockSP = pd.concat([ibex,materials,consumer,service,financial,petrol,technology,spainall],sort = True)
stockSP['Date'] = pd.to_datetime(stockSP.Date)
stockSP = stockSP.sort_values(by=['Symbol','Date'])
stockSP['Price'] = stockSP['Price'].str.replace(',','')
stockSP['Price'] = stockSP['Price'].astype(float)

# collapse-hide
line = alt.Chart(stockSP).mark_line(interpolate='basis').encode(
    x='Date',
    y='Price',
    color='Symbol',
)

nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['Date'], empty='none')

selectors = alt.Chart(stockSP, title='Major Index & Primary Sectors Stocks Price(Spain)').mark_point().encode(
    x='Date',
    opacity=alt.value(0)
).add_selection(
    nearest
)

# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Price', alt.value(' '))
)

# Draw a rule at the location of the selection
rules = alt.Chart(stockSP).mark_rule(color='gray').encode(
    x='Date',
).transform_filter(
    nearest
)

# Put the five layers into a chart and bind the data
alt.layer(
    line, selectors, points, rules, text
).properties(
    width=600, height=300
)

#collapse-hide

#importing stocks for the UK
ftse100 = pd.read_csv(path+'FTSE 100 Historical Data.csv')
auto= pd.read_csv(path+'FTSE 350 - Automobiles & Parts Historical Data.csv')
forestry = pd.read_csv(path+'FTSE 350 - Forestry & Paper Historical Data.csv')
metals = pd.read_csv(path+'FTSE 350 - Industrial Metals & Mining Historical Data.csv')
telecom = pd.read_csv(path+'FTSE 350 - Mobile Telecommunications Historical Data.csv')
realestate = pd.read_csv(path+'FTSE 350 - Real Estate Historical Data.csv')
#aerospace = pd.read_csv(path+'FTSE 350 Aerospace & Defense Historical Data.csv')
beverage = pd.read_csv(path+'FTSE 350 Beverages Historical Data.csv')
chemicalsuk = pd.read_csv(path+'FTSE 350 Chemicals Historical Data.csv')
construction = pd.read_csv(path+'FTSE 350 Construction & Building Materials Historical Data.csv')
ukall = pd.read_csv(path+'FTSE All-Share Historical Data.csv')

#prepare data for plotting
ftse100['Symbol']='FTSE 100'
auto['Symbol'] = 'Automobiles & Parts'
forestry['Symbol'] = 'Forestry & Paper'
metals['Symbol'] = 'Industrial Metals & Mining'
telecom['Symbol'] = 'Mobile Telecommunications'
realestate['Symbol'] = 'Real Estate'
#aerospace['Symbol'] = 'Aerospace & Defense'
beverage['Symbol'] = 'Beverages'
ukall['Symbol'] = 'United Kingdom All Shares'
chemicalsuk['Symbol'] = 'Chemicals'
construction['Symbol'] = 'Construction & Building Materials'

stockUK = pd.concat([ftse100,auto,forestry,metals,telecom,realestate,beverage,chemicalsuk,construction,ukall],sort = True)
stockUK['Date'] = pd.to_datetime(stockUK.Date)
stockUK = stockUK.sort_values(by=['Symbol','Date'])
stockUK['Price'] = stockUK['Price'].str.replace(',','')
stockUK['Price'] = stockUK['Price'].astype(float)

# collapse-hide
line = alt.Chart(stockUK).mark_line(interpolate='basis').encode(
    x='Date',
    y='Price',
    color='Symbol',
)

nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['Date'], empty='none')

selectors = alt.Chart(stockUK, title='Major Index & Primary Sectors Stocks Price (UK)').mark_point().encode(
    x='Date',
    opacity=alt.value(0)
).add_selection(
    nearest
)

# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Price', alt.value(' '))
)

# Draw a rule at the location of the selection
rules = alt.Chart(stockUK).mark_rule(color='gray').encode(
    x='Date',
).transform_filter(
    nearest
)

# Put the five layers into a chart and bind the data
alt.layer(
    line, selectors, points, rules, text
).properties(
    width=600, height=500
)

#collapse-hide

#importing stocks for the USA
dow30 = pd.read_csv(path+'Dow Jones Industrial Average Historical Data.csv')
SP = pd.read_csv(path+'S&P 500 Historical Data.csv')

nasdaq = pd.read_csv(path+'NASDAQ Composite Historical Data.csv')
#bious= pd.read_csv(path+'NASDAQ Biotechnology Historical Data.csv')
banksus = pd.read_csv(path+'NASDAQ Bank Historical Data.csv')
financialsus = pd.read_csv(path+'NASDAQ Financial 100 Historical Data.csv')
#healthus = pd.read_csv(path+'NASDAQ Health Care Historical Data.csv')
industrialsus = pd.read_csv(path+'NASDAQ Industrial Historical Data.csv')
insuranceus = pd.read_csv(path+'NASDAQ Insurance Historical Data.csv')
#internetus = pd.read_csv(path+'NASDAQ Internet Historical Data.csv')
computersus = pd.read_csv(path+'NASDAQ Computer Historical Data.csv')
telecomus = pd.read_csv(path+'NASDAQ Telecommunications Historical Data.csv')
transportationus = pd.read_csv(path+'NASDAQ Transportation Historical Data.csv')

#prepare data for plotting
dow30['Symbol']='Dow 30'
SP['Symbol'] ='S&P 500'
nasdaq['Symbol'] ='NASDAQ'
#bious['Symbol'] = 'Biotechnology'
banksus['Symbol'] = 'Banks'
financialsus['Symbol'] = 'Financials'
#healthus['Symbol'] = 'Health Care'
industrialsus['Symbol'] = 'Industrials'
insuranceus['Symbol'] = 'Insurance'
#internetus['Symbol'] = 'Internet'
computersus['Symbol'] = 'Computers'
telecomus['Symbol'] = 'Telecommunications'
transportationus['Symbol'] = 'Transportation'

stockUS = pd.concat([dow30,SP, nasdaq,banksus,financialsus,industrialsus,
                     insuranceus,computersus,
                    telecomus,transportationus],sort = True)
stockUS['Date'] = pd.to_datetime(stockUS.Date)
stockUS = stockUS.sort_values(by=['Symbol','Date'])
stockUS['Price'] = stockUS['Price'].str.replace(',','')
stockUS['Price'] = stockUS['Price'].astype(float)
stockUS.head()
     Change %        Date     High      Low     Open   Price  Symbol  Vol.
273     2.43%  2019-04-01  3,615.4  3,555.3  3,556.4  3613.8   Banks     -
272    -0.25%  2019-04-02  3,627.6  3,596.6  3,605.9  3604.9   Banks     -
271     0.20%  2019-04-03  3,651.3  3,604.5  3,638.1  3612.0   Banks     -
270     1.25%  2019-04-04  3,661.7  3,611.2  3,612.1  3657.0   Banks     -
269     0.51%  2019-04-05  3,676.8  3,646.4  3,661.2  3675.7   Banks     -
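
In the `stockUS.head()` output above only `Price` has been converted to a number; `High`, `Low`, `Open` and `Change %` are still strings. If those columns were ever needed for plotting, they could be converted in the same way. This is a minimal sketch: `to_number` and `sample` are hypothetical names, and the two rows are copied from the output above.

```python
import pandas as pd

# Two rows copied from the stockUS.head() output above.
sample = pd.DataFrame({
    'Change %': ['2.43%', '-0.25%'],
    'High': ['3,615.4', '3,627.6'],
    'Low': ['3,555.3', '3,596.6'],
    'Open': ['3,556.4', '3,605.9'],
})

def to_number(col):
    """Strip thousands separators and a trailing percent sign, then cast to float."""
    cleaned = col.astype(str).str.replace(',', '', regex=False).str.rstrip('%')
    return cleaned.astype(float)

numeric_cols = ['Change %', 'High', 'Low', 'Open']
sample[numeric_cols] = sample[numeric_cols].apply(to_number)
print(sample.dtypes)
```

The `Vol.` column is left out on purpose because it only contains `-` in the rows shown.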

# collapse-hide
line = alt.Chart(stockUS).mark_line(interpolate='basis').encode(
    x='Date',
    y='Price',
    color='Symbol',
)

nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['Date'], empty='none')

selectors = alt.Chart(stockUS, title='Major Index & Primary Sectors Stocks Price (USA)').mark_point().encode(
    x='Date',
    opacity=alt.value(0)
).add_selection(
    nearest
)

# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Price', alt.value(' '))
)

# Draw a rule at the location of the selection
rules = alt.Chart(stockUS).mark_rule(color='gray').encode(
    x='Date',
).transform_filter(
    nearest
)

# Put the five layers into a chart and bind the data
alt.layer(
    line, selectors, points, rules, text
).properties(
    width=600, height=300)

#collapse-hide

#importing stocks for China
shanghai = pd.read_csv(path+'Shanghai Composite Historical Data.csv')
szse= pd.read_csv(path+'SZSE Component Historical Data.csv')
#oilch = pd.read_csv(path+'FTSE China - Oil Equipment Services & Distribution Historical Data.csv')
banksch = pd.read_csv(path+'FTSE China A 600 - Banks Historical Data.csv')
#electricitych = pd.read_csv(path+'FTSE China A 600 - Electricity Historical Data.csv')
financialsch = pd.read_csv(path+'FTSE China A 600 - Financials Historical Data.csv')
gwch = pd.read_csv(path+'FTSE China A 600 - Gas & Water Multiutilities Historical Data.csv')
retailersch = pd.read_csv(path+'FTSE China A 600 - General Retailers Historical Data.csv')
lifeinsurancech = pd.read_csv(path+'FTSE China A 600 - Life Insurance Historical Data.csv')
mediach = pd.read_csv(path+'FTSE China A 600 - Media Historical Data.csv')
realestatech = pd.read_csv(path+'FTSE China A 600 - Real Estate Investment & Services Historical Data.csv')
scch = pd.read_csv(path+'FTSE China A 600 - Software & Computer Services Historical Data.csv')


#prepare data for plotting
shanghai['Symbol']='Shanghai Composite'
szse['Symbol'] = 'SZSE Component'
#oilch['Symbol'] = 'Oil Equipment Services & Distribution'
banksch['Symbol'] = 'Banks'
#electricitych['Symbol'] = 'Electricity'
financialsch['Symbol'] = 'Financials'
gwch['Symbol'] = 'Gas & Water'
retailersch['Symbol'] = 'General Retailers'
lifeinsurancech['Symbol'] = 'Life Insurance'
mediach['Symbol'] = 'Media'
realestatech['Symbol'] = 'Real Estate Investment & Services'
scch['Symbol'] = 'Software & Computer Services'


stockCH = pd.concat([shanghai,szse,banksch,financialsch,gwch,retailersch,lifeinsurancech,
                    mediach,realestatech,scch],sort = True)
stockCH['Date'] = pd.to_datetime(stockCH.Date)
stockCH = stockCH.sort_values(by=['Symbol','Date'])
stockCH['Price'] = stockCH['Price'].str.replace(',','')
stockCH['Price'] = stockCH['Price'].astype(float)

# collapse-hide
line = alt.Chart(stockCH).mark_line(interpolate='basis').encode(
    x='Date',
    y='Price',
    color='Symbol',
)

nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['Date'], empty='none')

selectors = alt.Chart(stockCH, title='Major Index & Primary Sectors Stocks Price (China)').mark_point().encode(
    x='Date',
    opacity=alt.value(0)
).add_selection(
    nearest
)

# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Price', alt.value(' '))
)

# Draw a rule at the location of the selection
rules = alt.Chart(stockCH).mark_rule(color='gray').encode(
    x='Date',
).transform_filter(
    nearest
)

# Put the five layers into a chart and bind the data
alt.layer(
    line, selectors, points, rules, text
).properties(
    width=600, height=500)

I was thinking of something like this for the stocks: a drop-down menu or a box filter (like the one on the left) where we select the country, and the chart then shows that country's indices and primary sectors, as in the graphs above, to make it more informative.

#collapse-hide

stockAll['Country']='Denmark'
stockFRA['Country']='France'
stockITA['Country']='Italy'
stockSP['Country']='Spain'
stockUK['Country']='UK'
stockUS['Country']='USA'

stocks = pd.concat([stockAll,stockFRA,stockITA,stockSP,stockUK,stockUS],sort = True)

#dropdown
countries = list(stocks.Country.unique())
country_dropdown = alt.binding_select(options=countries)
country_select = alt.selection_single(fields=['Country'], bind=country_dropdown, name="Select")


line = alt.Chart(stocks, title='Major Index & Primary Sectors Stocks Price (Major Countries)').mark_line(interpolate='basis').encode(
    x='Date',
    y='Price',
    color='Symbol',
    tooltip = [alt.Tooltip('Price:Q')]
).properties(width=600, height=500).add_selection(country_select).transform_filter(country_select)

line

# Toy example of a clickable box filter (multi-selection) driving a line chart
make = pd.DataFrame({'name': ['Honda', 'Ford', 'Dodge']})
fuel = pd.DataFrame({
    'Honda': [9, 8, 8, 7, 7],
    'Ford': [5, 4, 3, 2, 1],
    'Dodge': [6, 5, 5, 3, 4]
}).reset_index().melt(id_vars=['index'], var_name='name', value_name='fuel')

selection = alt.selection_multi(fields=['name'])


color = alt.condition(selection, alt.Color('name:N',legend=None), alt.value('lightgray'))

make_selector = alt.Chart(make).mark_rect().encode(y='name', color=color).add_selection(selection)


fuel_chart = alt.Chart(fuel).mark_line().encode(x='index', y=alt.Y('fuel', scale=alt.Scale(domain=[0, 10])), color='name').transform_filter(selection)

make_selector | fuel_chart

# List every index/sector label present in the combined frame
stocks.Symbol.unique()

GDP, Inflation & Unemployment Data

Annual GDP, inflation, and unemployment change rates for the major countries, taken from the IMF. The data include forecasts for 2020 and 2021.

I have removed Germany because we are not including it in the analysis above. We now need data for the UK, if possible.

# collapse-hide
# data preprocessing
def extract_data(df, subject):
    """Reshape a wide IMF table (one column per year) into long form."""
    dates = ['2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021']
    values = []
    countries = []
    _dates = []
    for country in df.Country.unique():
        # assumes exactly one row per country for the chosen subject
        tmp = df.loc[df.Country == country]
        for date in dates:
            countries.append(country)
            _dates.append(date)
            # take the scalar explicitly; float() on a Series is deprecated in recent pandas
            values.append(float(tmp[date].iloc[0]))

    rv = pd.DataFrame.from_dict({'Date': _dates, 'Country': countries, 'Value': values})
    rv['subject'] = subject
    return rv

unemploy = majorCountry.loc[majorCountry['Subject Descriptor'] == 'Unemployment rate']
unemploy = extract_data(unemploy[unemploy.Country != 'Germany'], 'unemployment')
inflation = majorCountry.loc[majorCountry['Subject Descriptor'] == 'Inflation, average consumer prices']
inflation = extract_data(inflation[inflation.Country != 'Germany'], 'inflation')
gdp = majorCountry.loc[majorCountry['Subject Descriptor'] == 'Gross domestic product, constant prices']
gdp = extract_data(gdp[gdp.Country != 'Germany'], 'gdp')

# A dropdown filter, built from the filtered data so that Germany (removed above) is not offered
countries = list(gdp.Country.unique())
country_dropdown = alt.binding_select(options=countries)
country_select = alt.selection_single(fields=['Country'], bind=country_dropdown, name="Select")

filter_gdp = alt.Chart(gdp, width=300, height=300, title='GDP Growth of Major Countries').mark_line(point=True).encode(
    alt.X('Date:T'),
    alt.Y('Value:Q', title= 'Growth Rate %'),
    color='Country',
    tooltip = [alt.Tooltip('Value:Q')]
).add_selection(country_select).transform_filter(country_select)

# unemployment plot
filter_unemployment = alt.Chart(unemploy, width=300, height=300, title='Unemployment Change of Major Countries').mark_line(point=True).encode(
    alt.X('Date:T'),
    alt.Y('Value:Q', title= 'Growth Rate %'),
    color='Country',
    tooltip = [alt.Tooltip('Value:Q')]
).add_selection(country_select).transform_filter(country_select)

# inflation plot
filter_inflation = alt.Chart(inflation, width=300, height=300, title='Inflation Change of Major Countries').mark_line(point=True).encode(
    alt.X('Date:T'),
    alt.Y('Value:Q', title= 'Growth Rate %'),
    color='Country',
    tooltip = [alt.Tooltip('Value:Q')]
).add_selection(country_select).transform_filter(country_select)


(filter_gdp | filter_unemployment | filter_inflation)
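
The wide-to-long reshaping that `extract_data` performs with nested loops can also be written with `DataFrame.melt`. The sketch below is an alternative, not the notebook's own method: `extract_data_melt` and `wide` are hypothetical names, and the numbers are made up purely to show the shape of the result.

```python
import pandas as pd

# Hypothetical stand-in for the IMF table: one row per country, one column per year.
wide = pd.DataFrame({
    'Country': ['Denmark', 'Spain'],
    '2019': [2.4, 2.0],
    '2020': [-6.5, -8.0],
})

def extract_data_melt(df, subject, dates):
    """Turn the year columns into (Country, Date, Value) rows and tag the subject."""
    long = df.melt(id_vars='Country', value_vars=dates,
                   var_name='Date', value_name='Value')
    long['Value'] = long['Value'].astype(float)
    long['subject'] = subject
    return long.sort_values(['Country', 'Date']).reset_index(drop=True)

gdp_demo = extract_data_melt(wide, 'gdp', ['2019', '2020'])
print(gdp_demo)
```

The resulting frame has the same columns as the one returned by `extract_data` (in a different order), so it could be passed to the same charts.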

References

  1. Investopedia